Abstract

Short-term memory cannot in general be explained the way long-term
memory can - as a gradual modification of synaptic conductances - since
it takes place too quickly. Theories based on some form of cellular
bistability, however, do not seem to be able to account for the fact that
noisy neurons can collectively store information in a robust manner. We
show how a sufficiently clustered network of simple model neurons can be
instantly induced into metastable states capable of retaining information
for a short time. Cluster Reverberation, as we call it, could constitute
a viable mechanism available to the brain for robust short-term memory
with no need of synaptic learning. Relevant phenomena described by
neurobiology and psychology, such as power-law statistics of forgetting
avalanches, emerge naturally from this mechanism.

Keywords: Working memory, sensory memory, iconic memory, power-law
forgetting, nonequilibrium neural networks.

1 Slow but sure, or fast and fleeting? 

Of all brain phenomena, memory is probably one of the best understood [1] [2] [3].
Consider a set of many neurons, defined as elements with two possible states
(firing or not firing, one or zero) connected among each other in some way by
synapses which carry a proportion of the current let off by a firing neuron to
its neighbours; the probability that a given neuron has of firing at a certain
time is then some function of the total current it has just received. Such a
simplified model of the brain is able to store and retrieve information, in the
form of patterns of activity (i.e., particular configurations of firing and non-firing
neurons) when the synaptic conductances, or weights, have been appropriately 
set according to a learning rule [4] . Because each of the stored patterns becomes 
an attractor of the dynamics, the system will evolve towards whichever of the 
patterns most resembles the initial configuration. Artificial systems used for 
tasks such as pattern recognition and classification, as well as more realistic 
neural network models that take into account a variety of subcellular processes, 
all tend to rely on this basic mechanism, known as Associative Memory [5] [6].
Synaptic conductances in animal brains have indeed been found to become 
strengthened or weakened during learning, via the biochemical processes of
long-term potentiation (LTP) and depression (LTD) [7] [8]. Further support for the
hypothesis that such a mechanism underlies long-term memory (LTM) comes
from psychology, where so-called connectionist models are increasingly found
to fit in well with observed brain phenomena [9] [10]. However,
some memory processes take place on timescales of seconds or less and in many
instances cannot be accounted for by LTP and LTD [11], since these require at
least minutes to be effected [12] [13]. For example, Sperling found that visual
stimuli are recalled in great detail for up to about one second after exposure 
(iconic memory) [14]; similarly, acoustic information seems to linger for three
or four seconds (echoic memory) [15]. In fact, it appears that the brain actually
holds and continually updates a kind of buffer in which sensory information
regarding its surroundings is maintained (sensory memory) [16]. This is easily
observed by simply closing one's eyes and recalling what was last seen, or
thinking about a sound after it has finished. Another instance is the capability
referred to as working memory [11] [17]: just as a computer requires RAM for its
calculations despite having a hard drive for long term storage, the brain must 
continually store and delete information to perform almost any cognitive task. 
To some extent, working memory could consist in somehow labelling or bringing 
forward previously stored concepts, like when one is asked to remember a par- 
ticular sequence of digits or familiar shapes. But we are also able to manipulate 
- if perhaps not quite so well - shapes and symbols we have only just become 
acquainted with, too recently for them to have been learned synaptically. We 
shall here use short-term memory (STM) to describe the brain's ability to store 
information on a timescale of seconds or less (footnote 1).

Evidence that short-term memory is related to sensory information while 
long-term memory is more conceptual can again be found in psychology. For 
instance, a sequence of similar sounding letters is more difficult to retain for a 
short time than one of phonetically distinct ones, while this has no bearing on 
long-term memory, for which semantics seems to play the main role [18] [19]; and
the way many of us think about certain concepts, such as chess, geometry or 
music, is apparently quite sensorial: we imagine positions, surfaces or notes as 
they would look or sound. Most theories of short-term memory - which almost 
always focus on working memory - make use of some form of previously stored 
information (i.e., of synaptic learning) and so can account for the labelling 
tasks referred to above but not for the instant recall of novel information [20]
[21] [22] [23]. Attempts to deal with the latter have been made by proposing
mechanisms of cellular bistability: neurons are assumed to retain the state they
are placed in (such as firing or not firing) for some period of time thereafter [24]
[25] [26]. Although there may indeed be subcellular processes leading to a certain
bistability, the main problem with short-term memory depending exclusively on 
such a mechanism is that if each neuron must act independently of the rest the 
patterns will not be robust to random fluctuations [11] - and the behaviour of
individual neurons is known to be quite noisy [27]. It is worth pointing out that
one of the strengths of Associative Memory is that the behaviour of a given
neuron depends on many neighbours and not just on itself, which means that
robust global recall can emerge despite random fluctuations at an individual
level.

1 We should mention that sensory memory is usually considered distinct from STM - and
probably has a different origin - but we shall use "short-term memory" generically since the
mechanism we propose in this paper could be relevant for either or both phenomena. On the
other hand, the recent flurry of research in psychology and neuroscience on working memory
has led to this term sometimes being used to mean short-term memory; strictly speaking,
however, working memory is generally considered to be an aspect of cognition which operates
on information stored in STM.

Something that, at least until recently, most neural network models have 
failed to take into account is the structure of the network - its topology - it 
often being assumed that synapses are placed among the neurons completely 
at random, or even that all neurons are connected to all the rest (a mathe- 
matically convenient but unrealistic situation). Although relatively little is yet 
known about the architecture of the brain at the level of neurons and synapses, 
experiments have shown that it is heterogeneous (some neurons have very many 
more synapses than others), clustered (two neurons have a higher chance of being
connected if they share neighbours than if not) and highly modular (there
are groups, or modules, with neurons forming synapses preferentially to those in 
the same module) [28] [29]. We show here that it suffices to use a more realistic
topology, in particular one which is modular and/or clustered, for a randomly 
chosen pattern of activity the system is placed in to be metastable. This means 
that novel information can be instantly stored and retained for a short period of 
time in the absence of both synaptic learning and cellular bistability. The only 
requisite is that the patterns be coarse grained versions of the usual patterns - 
that is, whereas it is often assumed that each neuron in some way represents 
one bit of information, we shall allocate a bit to a small group of neurons
(four or five can be enough; see footnote 2).

The mechanism, which we call Cluster Reverberation, is very simple. If 
neurons in a group are more highly connected to each other than to the rest 
of the network, either because they form a module or because the network is 
significantly clustered, they will tend to retain the activity of the group: when 
they are all initially firing, they each continue to receive many action potentials 
and so go on firing, whereas if they start off silent, there is not usually enough 
input current from the outside to set them off. The fact that each neuron's 
state depends on its neighbours confers on the mechanism a certain robustness
in the face of random fluctuations. This robustness is particularly important 
for biological neurons, which as mentioned are quite noisy. Furthermore, not 
only does the limited duration of short-term memory states emerge naturally 
from this mechanism (even in the absence of interference from new stimuli) but 
this natural forgetting follows power-law statistics, as in experimental settings 
[30] [31] [32].

The process is reminiscent both of block attractors in ordinary neural networks
[33] and of domains in magnetic materials [31], while Muñoz et al. have
recently highlighted a similarity with Griffiths phases on networks [33]. It can
also be interpreted as a multiscale phenomenon: the mesoscopic clusters take 
on the role usually played by individual neurons, yet make use of network prop- 
erties. Although the mechanism could also work in conjunction with other ones, 
such as synaptic learning or cellular bistability, we shall illustrate it by consid- 
ering the simplest model which has the necessary ingredients: a set of binary 
neurons linked by synapses of uniform weight according to a topology whose 
modularity or clustering we shall tune. As with Associative Memory, this
mechanism of Cluster Reverberation appears to be simple and robust enough not to
be qualitatively affected by the complex subcellular processes incorporated into
more realistic neuron models - such as integrate-and-fire or Hodgkin-Huxley
neurons. However, such refinements are probably needed to achieve graded
persistent activity, since the mean frequency of each cluster could then be set to a
certain value.

2 This does not, of course, mean that memories are expected to be encoded as bitmaps.
Just as with individual neurons, positions or orientations, say, could be represented by the
activation of particular sets of clusters.



2 The simplest neurons on modular networks 

We consider a network of N model neurons, with activities $s_i = \pm 1$. The
topology is given by the adjacency matrix $\hat{a}_{ij} = \{1, 0\}$, each element representing
the existence or absence of a synapse from neuron j to neuron i ($\hat{a}$ need not
be symmetric). In this kind of model, each edge usually has a synaptic weight
associated, $\omega_{ij} \in \mathbb{R}$; however, we shall here consider these to have all the same
value: $\omega_{ij} = \omega$ $\forall i, j$. Neurons are updated in parallel (Little dynamics) at each
time step, according to the stochastic transition rule

$$P(s_i \to \pm 1) = \pm \frac{1}{2} \tanh\left(\frac{h_i}{T}\right) + \frac{1}{2},$$

where the field of neuron i is defined as

$$h_i = \omega \sum_{j=1}^{N} \hat{a}_{ij} s_j,$$

and T is a parameter we shall call temperature.
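
As a minimal sketch of one parallel (Little) update, the following plain-Python functions may help fix ideas. They assume that the field of a neuron is the uniform weight times the summed activity of its presynaptic neighbours; the function names, the `in_nbrs` list-of-lists representation of the adjacency matrix and the default `omega = 1.0` are illustrative choices, not part of the model description.

```python
import math
import random

def field(i, s, in_nbrs, omega=1.0):
    """Field of neuron i: omega times the summed activity of its presynaptic neighbours."""
    return omega * sum(s[j] for j in in_nbrs[i])

def p_fire(h, T):
    """Probability that a neuron with field h takes the value +1 at the next step."""
    return 0.5 * math.tanh(h / T) + 0.5

def update(s, in_nbrs, T, omega=1.0, rng=random):
    """One parallel step: every neuron is updated from the same old state s."""
    new_s = []
    for i in range(len(s)):
        p_plus = p_fire(field(i, s, in_nbrs, omega), T)
        new_s.append(1 if rng.random() < p_plus else -1)
    return new_s
```

Because all fields are computed from the old state before any neuron is changed, the update is parallel rather than sequential; T must be strictly positive in this sketch.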

First of all, we shall consider the network defined by $\hat{a}$ to be made up of M
distinct modules. To achieve this, we can first construct M separate random
directed networks, each with n = N/M nodes and mean degree (mean number of
neighbours) $\langle k \rangle$. Then we evaluate each edge and, with probability $\lambda$, eliminate
it and replace it with another edge between the original post-synaptic neuron
and a new pre-synaptic neuron chosen at random from among any of those
in other modules (footnote 3). Note that this protocol does not alter the number of
pre-synaptic neighbours of each node, $k_i^{in} = \sum_j \hat{a}_{ij}$ (although the number of
post-synaptic neurons, $k_i^{out} = \sum_j \hat{a}_{ji}$, can vary). The parameter $\lambda$ can be seen
as a measure of modularity of the partition considered, since it coincides with
the expected value of the proportion of edges that link different modules. In
particular, $\lambda = 0$ defines a network of disconnected modules, while $\lambda = 1 - M^{-1}$
yields a random network in which this partition has no modularity. If
$\lambda \in (1 - M^{-1}, 1)$, the partition is less than randomly modular - i.e., it is
quasi-multipartite (or multipartite if $\lambda = 1$).
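
The construction just described can be sketched as follows. This is an illustration under simplifying assumptions rather than the exact protocol: every neuron is given exactly `k` presynaptic neighbours (instead of a random number with mean k), neuron `i` is assigned to module `i // n`, and the function names are hypothetical. It requires `k <= n - 1`, and at least two modules whenever `lam > 0`.

```python
import random

def modular_network(M, n, k, lam, rng=random):
    """Return in_nbrs, where in_nbrs[i] lists the presynaptic neighbours of neuron i.

    Each neuron first receives k edges from distinct neurons of its own module;
    each edge is then rewired, with probability lam, so that it comes from a
    neuron chosen at random in another module. In-degrees are left unchanged.
    """
    N = M * n
    in_nbrs = []
    for i in range(N):
        mod = i // n
        own = [j for j in range(mod * n, (mod + 1) * n) if j != i]
        nbrs = rng.sample(own, k)              # k distinct intra-module inputs, no self-edge
        for idx in range(k):
            if rng.random() < lam:
                j = rng.randrange(N)
                while j // n == mod or j in nbrs:   # must lie in another module, no duplicates
                    j = rng.randrange(N)
                nbrs[idx] = j
        in_nbrs.append(nbrs)
    return in_nbrs

def intermodule_fraction(in_nbrs, n):
    """Proportion of edges whose two ends lie in different modules."""
    total = sum(len(nb) for nb in in_nbrs)
    crossing = sum(1 for i, nb in enumerate(in_nbrs) for j in nb if j // n != i // n)
    return crossing / total
```

Measuring `intermodule_fraction` on a generated network gives a quick check that the rewiring probability does play the role of the proportion of edges linking different modules.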

If the size of the modules is of the order of $\langle k \rangle$, the network will also be
highly clustered. Taking into account that the network is directed, let us define
the clustering coefficient $C_i$ as the probability, given that there is a synapse from
neuron i to a neuron j and from another neuron l to i, that there be a synapse
from j to l: that is, that there exist a feedback loop $i \to j \to l \to i$. Then,
assuming $M \gg 1$, the expected value of the clustering coefficient $C \equiv \langle C_i \rangle$ is

$$C \gtrsim \frac{\langle k \rangle - 1}{n - 1} (1 - \lambda)^3.$$

3 We do not allow self-edges (although these can occur in reality) since they can be regarded
as a form of cellular bistability.

3 Cluster Reverberation 

A memory pattern, in the form of a given configuration of activities, $\{\xi_i = \pm 1\}$,
can be stored in this system with no need of prior learning. Imagine a pattern
such that the activities of all n neurons found in any module are the same,
i.e., $\xi_i = \xi_{\mu(i)}$, where the index $\mu(i)$ denotes the module that neuron i belongs
to. This can be thought of as a coarse graining of the standard idea of memory
patterns, in which each neuron represents one bit of information. In our scheme,
each module represents - and stores - one bit. The system can be induced into
this configuration via the application of an appropriate stimulus (see Fig. 1):
the field of each neuron will be altered for just one time step according to

$$h_i \to h_i + \delta \xi_i, \quad \forall i,$$

where the factor $\delta$ is the intensity of the stimulus. This mechanism for dynamically
storing information will work for values of parameters such that the system
is sensitive to the stimulus, acquiring the desired configuration, yet also able to
retain it for some interval of time thereafter.




Figure 1: Diagram of a modular network composed of four five-neuron clusters.
The four circles enclosed by the dashed line represent the stimulus: each is 
connected to a particular module, which adopts the input state (red or blue) 
and retains it after the stimulus has disappeared via Cluster Reverberation. 



The two main attractors of the system are $s_i = 1$ $\forall i$ and $s_i = -1$ $\forall i$. These
are the configurations of minimum energy (see the next section for a more
detailed discussion on energy). However, the energy is locally minimised for
any configuration in which $s_i = d_{\mu(i)}$ $\forall i$ with $d_\mu = \pm 1$; that is, configurations
such that each module comprises either all active or all inactive neurons. These
are the configurations that we shall use to store information. We define the
mean activity (footnote 4) of each module,

$$m_\mu \equiv \frac{1}{n} \sum_{i \in \mu} s_i,$$

which is a mesoscopic variable, as well as the global mean activity,

$$m \equiv \frac{1}{N} \sum_{i} s_i = \frac{1}{M} \sum_{\mu} m_\mu$$

(these magnitudes change with time, but, where possible, we shall avoid writing
the time dependence explicitly for clarity). The extent to which the network, at
a given time, retains the pattern $\{\xi_i\}$ with which it was stimulated is measured
with the overlap parameter

$$m_{stim} \equiv \frac{1}{N} \sum_{i} \xi_i s_i.$$

Ideally, the system should be capable of reacting immediately to a stimulus by
adopting the right configuration, yet also be able to retain it for long enough
to use the information once the stimulus has disappeared. A measure of
performance for such a task is therefore

$$\eta \equiv \frac{1}{\tau} \sum_{t = t_0 + 1}^{t_0 + \tau} m_{stim}(t),$$

where $t_0$ is the time at which the stimulus is received and $\tau$ is the period of time
we are interested in ($|\eta| \leq 1$) [38]. If the intensity of the stimulus, $\delta$, is very
large, then the system will always adopt the right pattern perfectly and $\eta$ will
only depend on how well it can then retain it. In this case, the best network will
be one that is made up of unconnected modules. However, since the stimulus
in a real brain can be expected to arrive via a relatively small number of axons,
either from another part of the brain or directly from sensory cells, it is more
realistic to assume that $\delta$ is of a similar order as the input a typical neuron
receives from its neighbours, $\langle h \rangle \sim \omega \langle k \rangle$.
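
The whole stimulate-and-retain experiment can be sketched in a few lines of plain Python. The sketch rests on several assumptions that are ours rather than the text's: the stimulus adds `delta * xi[i]` to each field for a single step, performance is the overlap with the pattern averaged over the `tau` steps that follow, every neuron has exactly `k` inputs, and all function names and default values are hypothetical.

```python
import math
import random

def build(M, n, k, lam, rng):
    """Modules of n neurons; each neuron gets k inputs, each rewired out of its module w.p. lam."""
    N = M * n
    in_nbrs = []
    for i in range(N):
        mod = i // n
        own = [j for j in range(mod * n, (mod + 1) * n) if j != i]
        nbrs = rng.sample(own, k)
        for idx in range(k):
            if rng.random() < lam:
                j = rng.randrange(N)
                while j // n == mod or j in nbrs:
                    j = rng.randrange(N)
                nbrs[idx] = j
        in_nbrs.append(nbrs)
    return in_nbrs

def step(s, in_nbrs, T, omega, rng, xi=None, delta=0.0):
    """Parallel update; if a pattern xi is given, each field gains delta * xi[i] for this step."""
    new_s = []
    for i, nbrs in enumerate(in_nbrs):
        h = omega * sum(s[j] for j in nbrs)
        if xi is not None:
            h += delta * xi[i]
        p_plus = 0.5 * math.tanh(h / T) + 0.5
        new_s.append(1 if rng.random() < p_plus else -1)
    return new_s

def performance(M, n, k, lam, T, delta, tau, omega=1.0, seed=0):
    """Overlap with a random coarse-grained pattern, averaged over tau steps after the stimulus."""
    rng = random.Random(seed)
    N = M * n
    in_nbrs = build(M, n, k, lam, rng)
    s = [rng.choice((-1, 1)) for _ in range(N)]              # arbitrary initial state
    bits = [rng.choice((-1, 1)) for _ in range(M)]           # one bit per module
    xi = [bits[i // n] for i in range(N)]
    s = step(s, in_nbrs, T, omega, rng, xi=xi, delta=delta)  # stimulus lasts one step
    total = 0.0
    for _ in range(tau):
        total += sum(a * b for a, b in zip(xi, s)) / N       # overlap at this time step
        s = step(s, in_nbrs, T, omega, rng)
    return total / tau
```

For instance, `performance(M=8, n=10, k=9, lam=0.0, T=0.02, delta=100.0, tau=50)` describes disconnected modules under a very strong stimulus, the limiting case discussed above in which the pattern is adopted and retained perfectly.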

Fig. 2 shows the mean performance obtained when the network is repeatedly
stimulated with different randomly generated patterns. For low enough values
of the modularity $\lambda$ and stimuli of intensity $\delta > \omega \langle k \rangle$, the system can capture
and successfully retain any pattern it is "shown" for some period of time, even
though this pattern was in no way previously learned. For less intense stimuli
($\delta < \omega \langle k \rangle$), performance is nonmonotonic with modularity: there exists an
optimal value of $\lambda$ at which the system is sensitive to stimuli yet still able to
retain new patterns quite well.

4 The mean activity in a neural network model is usually taken to represent the mean firing
rate measured in experiments [3].

It is worth noting that performance can also break down due to thermal
fluctuations. The two main attractors of the system ($s_i = 1$ $\forall i$ and $s_i = -1$
$\forall i$) undergo the typical second order phase transition of the Hopfield model [5],
from a memory phase (one in which $m = 0$ is not stable and stable solutions
$m \neq 0$ exist) to one with no memory (with $m = 0$ the only stable solution), at
the critical temperature [35]



$$T_c = \omega \, \frac{\ldots}{\langle k \rangle}.$$



(Note that, in a directed network, $\langle k_{in} \rangle = \langle k_{out} \rangle = \langle k \rangle$, although the other
moments can in general be different.) The metastable states we are interested
in, though, have a critical temperature

$$T'_c = (1 - \lambda) T_c$$

(assuming that the mean activity of the network is $m \simeq 0$). That is, the
temperature at which the modules are no longer able to retain their individual
activity is in general lower than that at which the solution $m = 0$ for the
whole network becomes stable.

Figure 2: Performance $\eta$ against $\lambda$ for networks of the sort described in the main
text with M = 160 modules of n = 10 neurons, $\langle k \rangle = 9$; patterns are shown
with intensities $\delta = 8.5$, 9 and 10, and T = 0.02 (lines - splines - are drawn as
a guide to the eye). Inset: typical time series of $m_{stim}$ (i.e., the overlap with
whichever pattern was last shown) for $\lambda = 0.0$, 0.25, and 0.5, and $\delta = \langle k \rangle = 9$.



4 Energy and topology 

Each pair of nodes contributes a configurational energy

$$e_{ij} = -\frac{1}{2} \omega (\hat{a}_{ij} + \hat{a}_{ji}) s_i s_j;$$

that is, if there is an edge from i to j and they have opposite activities, the energy
is increased by $\frac{1}{2}\omega$, whereas it is decreased by the same amount if their activities
are the same. Given a configuration, we can obtain its associated energy by
summing over all pairs. We shall be interested in configurations with x neurons
that have s = +1 (and N - x with s = -1), chosen in such a way that one
module at most, say $\mu$, has neurons in both states simultaneously. Therefore,
$x = n\rho + z$, where $\rho$ is the number of modules with all their neurons in the
positive state and z is the number of neurons with positive sign in module $\mu$.
We can write $m = 2x/N - 1$ and $m_\mu = 2z/n - 1$. The total configurational
energy of the system will be $E = \sum_{i<j} e_{ij} = \frac{1}{2} \omega (2 L_{\uparrow\downarrow} - \langle k \rangle N)$, where $L_{\uparrow\downarrow}$ is the
number of edges linking nodes with opposite activities. By simply counting over
edges, we can obtain the expected value of $L_{\uparrow\downarrow}$ (which amounts to a mean-field
approximation because we are substituting the number of edges between two
neurons for its expected value), which yields Eq. (1).

Fig. 3 shows the mean-field configurational energy curves for various values
of the modularity on a small modular network. The local minima (metastable
states) are the configurations used to store patterns. It should be noted that
the mapping $x \to m$ is highly degenerate: there are $\binom{M}{\rho}$ patterns with
$\rho = M(1 + m)/2$ fully active modules, and hence mean activity m, that all have the same energy.
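
The relation between the pairwise energies and the count of edges joining opposite activities can be checked numerically. The sketch below assumes our reading of the pair energy, namely a contribution of minus half the weight times the product of activities for every directed edge, summed over unordered pairs; the function names and the `in_nbrs` representation are illustrative.

```python
import random

def pair_energy_sum(in_nbrs, s, omega=1.0):
    """Sum of e_ij = -(omega/2) (a_ij + a_ji) s_i s_j over unordered pairs of nodes.

    Each directed edge j -> i contributes exactly one term -(omega/2) s_i s_j.
    """
    return -0.5 * omega * sum(s[i] * s[j] for i, nbrs in enumerate(in_nbrs) for j in nbrs)

def edge_count_energy(in_nbrs, s, omega=1.0):
    """Same energy written as omega * (L_opposite - edges / 2)."""
    edges = sum(len(nbrs) for nbrs in in_nbrs)
    opposite = sum(1 for i, nbrs in enumerate(in_nbrs) for j in nbrs if s[i] != s[j])
    return omega * (opposite - 0.5 * edges)
```

Both functions return the same value for any configuration, and the energy is lowest when no edge joins neurons in opposite states, as in the two main attractors.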




Figure 3: Configurational energy of a network composed of M = 20 modules
of n = 10 neurons each, according to Eq. (1), for various values of the rewiring
probability $\lambda$. The minima correspond to situations such that all neurons within
any given module have the same sign.



5 Forgetting avalanches 



In obtaining the energy we have assumed that the number of synapses rewired
from a given module is always $\nu = \langle k \rangle n \lambda$. However, since each edge is rewired
with probability $\lambda$, $\nu$ will in fact vary somewhat from one module to another,
being approximately Poisson distributed with mean $\langle \nu \rangle = \langle k \rangle n \lambda$. The depth of
the energy well corresponding to a given module is then, neglecting all but the
first term in Eq. (1) and approximating $n - 1 \simeq n$,

$$\Delta E \simeq \frac{1}{2} \omega (n \langle k \rangle - \nu).$$

The typical escape time $\tau$ from an energy well of depth $\Delta E$ at temperature T
is $\tau \sim e^{\Delta E / T}$ [36]. Using Stirling's approximation in the Poissonian distribution
of $\nu$ and expressing it in terms of $\tau$, we find that the escape times are distributed
according to

$$P(\tau) \sim \left(1 - \frac{2T}{\omega n \langle k \rangle} \ln \tau\right)^{\ldots} \tau^{-\beta(\tau)}, \qquad (2)$$

where

$$\beta(\tau) = 1 + \ldots \qquad (3)$$

Therefore, at low temperatures, $P(\tau)$ will behave approximately like a power
law. The left panel of Fig. 4 shows the distribution of time intervals
between events in which the overlap $m_\mu$ of at least one module $\mu$ changes sign.
The power-law-like behaviour is apparent, and justifies talking about forgetting
avalanches - since there are cascades of many forgetting events interspersed
with long periods of metastability. This is very similar to the behaviour
observed in other nonequilibrium settings in which power-law statistics arise from
the convolution of exponentials [37] [38].
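
How a spread of well depths translates into escape times spanning many decades can be illustrated by direct sampling. The sketch assumes a well depth of half the weight times the number of non-rewired synapses of the module, and an Arrhenius-like escape time equal to the exponential of depth over temperature; it returns the logarithm of the escape time because the time itself overflows floating point at low temperature. The names `poisson` and `log_escape_times` are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def log_escape_times(n, k, lam, T, omega=1.0, samples=10000, seed=0):
    """Sample ln(tau) = Delta E / T with Delta E = (omega / 2) * (n * k - nu)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(samples):
        nu = min(poisson(k * n * lam, rng), n * k)   # at most n * k synapses can be rewired
        logs.append(0.5 * omega * (n * k - nu) / T)
    return logs
```

With n = 10, k = 9, a rewiring probability of 0.22 and T = 0.02, a change of a single rewired synapse shifts the logarithm of the escape time by 25, which is why a modest Poisson spread in the number of rewired synapses yields such a broad distribution.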

It is known from experimental psychology that forgetting in humans is indeed
well described by power-laws [30] [31] [32]. The right panel of Fig. 4 shows the
value of the exponent $\beta(\tau)$ as a function of $\tau$. Although for low temperatures
it is almost constant over many decades of $\tau$ - approximating a pure power-law
- for any finite T there will always be a $\tau$ such that the denominator in the
logarithm of Eq. (3) approaches zero and $\beta$ diverges, signifying a truncation of
the distribution.
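
The way a slowly varying, eventually diverging exponent can arise from Poisson-distributed well depths may be sketched as follows. This is an illustrative reconstruction, not necessarily the form in which the result is stated above: it assumes $\Delta E = \frac{1}{2}\omega(n\langle k\rangle - \nu)$, introduces the shorthand $a \equiv 2T/\omega$ (a symbol of ours), and drops factors that do not depend on $\tau$.

```latex
% Assumptions (ours): \Delta E = \tfrac{\omega}{2}(n\langle k\rangle - \nu), \quad a \equiv 2T/\omega.
\begin{align*}
\ln\tau &= \frac{\Delta E}{T}
  \;\Rightarrow\; \nu = n\langle k\rangle - a\ln\tau, \qquad
  \left|\frac{d\nu}{d\tau}\right| = \frac{a}{\tau}, \\
P_{\mathrm{Poisson}}(\nu) &= \frac{e^{-\langle\nu\rangle}\langle\nu\rangle^{\nu}}{\nu!}
  \simeq \frac{e^{-\langle\nu\rangle}}{\sqrt{2\pi\nu}}
  \left(\frac{e\langle\nu\rangle}{\nu}\right)^{\nu}
  \quad \text{(Stirling)}, \\
P(\tau) &= P_{\mathrm{Poisson}}(\nu)\left|\frac{d\nu}{d\tau}\right|
  \;\propto\; \nu^{-1/2}
  \left(\frac{e\langle\nu\rangle}{\nu}\right)^{n\langle k\rangle}
  \tau^{-1 - a\left[1 + \ln\left(\langle\nu\rangle/\nu\right)\right]}.
\end{align*}
```

Reading off the power of $\tau$, and using $\langle\nu\rangle = \langle k\rangle n\lambda$, gives an effective exponent $1 + a\left[1 + \ln\left(\lambda n\langle k\rangle / (n\langle k\rangle - a\ln\tau)\right)\right]$. Under these assumptions the exponent is nearly constant while $a\ln\tau \ll n\langle k\rangle$, and the logarithm diverges as $a\ln\tau \to n\langle k\rangle$, in line with the truncation described above.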

6 Clustered networks 

Although we have illustrated how the mechanism of Cluster Reverberation works
on a modular network, it is not actually necessary for the topology to have this
characteristic - only for the patterns to be in some way "coarse-grained," as
described, and that each region of the network encoding one bit have a small
enough parameter $\lambda$, defined as the proportion of synapses to other regions. For
instance, for the famous Watts-Strogatz small-world model [35] - a ring of N
nodes, each initially connected to its k nearest neighbours before a proportion
p of the edges are randomly rewired - we have $\lambda \simeq p$ (which is not surprising
considering the resemblance between this model and the modular network used
above). More precisely, the expected modularity of a randomly imposed box of
n neurons is given by an expression with three terms, the second accounting
for the edges rewired to the same box, and the third for the edges not rewired
but sufficiently close to the border to connect with a different box.

Figure 4: Left panel: distribution of escape times $\tau$, as defined in the main
text, for $\lambda = 0.22$ and T = 0.02. Other parameters as in Fig. 2. Symbols from
MC simulations and line given by Eqs. (2) and (3). Right panel: exponent $\beta$
of the quasi-power-law distribution $P(\tau)$ as given by Eq. (3) for temperatures
T = 1 (red line), T = 2 (green line) and T = 3 (blue line).
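
The proportion of edges leaving a box on a small-world ring can be measured directly. The sketch below is a simplified stand-in for the Watts-Strogatz construction: edges are treated as directed from each node to its k/2 clockwise neighbours, each edge is rewired to a random target with probability p, and boxes are blocks of n consecutive nodes (so N should be a multiple of n). The function names are hypothetical.

```python
import random

def watts_strogatz_out(N, k, p, rng=random):
    """out_nbrs[i]: targets of the k/2 clockwise edges of node i, each rewired w.p. p."""
    out_nbrs = []
    for i in range(N):
        targets = [(i + d) % N for d in range(1, k // 2 + 1)]
        for idx in range(len(targets)):
            if rng.random() < p:
                j = rng.randrange(N)
                while j == i or j in targets:      # no self-edges, no duplicate targets
                    j = rng.randrange(N)
                targets[idx] = j
        out_nbrs.append(targets)
    return out_nbrs

def box_modularity(out_nbrs, n):
    """Proportion of edges joining different boxes of n consecutive nodes."""
    total = sum(len(t) for t in out_nbrs)
    crossing = sum(1 for i, t in enumerate(out_nbrs) for j in t if j // n != i // n)
    return crossing / total
```

With no rewiring, only edges near a box border cross it (3 of the 40 edges per box for k = 4 and n = 20), whereas with full rewiring almost every edge leaves its box; intermediate values of p interpolate between the two.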

Perhaps a more realistic model of clustered network would be a random
network embedded in d-dimensional Euclidean space. For this we shall use the
scheme laid out by Rozenfeld et al. [40], which consists simply in allocating
each node to a site on a d-torus and then, given a particular degree sequence,
placing edges to the nearest nodes possible - thereby attempting to minimise
total edge length (see the footnote on the cutoff distance). For a scale-free
degree sequence (i.e., a set $\{k_i\}$ drawn from a degree distribution
$p(k) \sim k^{-\gamma}$) according to some exponent $\gamma$, then, as shown
in Appendix 1, such a network has a modularity

$$\lambda \simeq \frac{d(\gamma - 2) \, l^{-1} - l^{-d(\gamma - 2)}}{d(\gamma - 2) - 1}, \qquad (4)$$

where l is the linear size of the boxes considered.
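
For quick numerical exploration, the expression for the modularity of boxes on an embedded scale-free network can be wrapped in a small function. The name `modularity_eq4` is hypothetical, and the special case in which d times (gamma - 2) equals one, where the closed form becomes 0/0, is handled with the limiting value (1 + ln l)/l, which is our own addition.

```python
import math

def modularity_eq4(gamma, l, d=2):
    """Proportion of edges leaving a box of linear size l, for degree exponent gamma."""
    a = d * (gamma - 2.0)
    if abs(a - 1.0) < 1e-9:
        # Limit a -> 1 of (a / l - l**(-a)) / (a - 1), obtained by l'Hopital's rule.
        return (1.0 + math.log(l)) / l
    return (a / l - l ** (-a)) / (a - 1.0)
```

For d = 2 and l = 10 (boxes of 100 neurons), the function gives 1 at an exponent of 2, where edges are typically longer than the box, and 0.19 at an exponent of 3, decreasing further as the degree distribution becomes less heterogeneous.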

Fig. 5 compares this expression with the value obtained numerically after
averaging over many network realizations, and shows that it is fairly good -
considering the approximations used for its derivation. It is interesting that even in
this scenario, where the boxes of neurons which are to receive the same stimulus
are chosen at random with no consideration for the underlying topology, these
boxes need not have very many neurons for $\lambda$ to be quite low (as long as the
degree distribution is not too heterogeneous).

Carrying out the same repeated stimulation test as on the modular networks
in Fig. 2, we find a similar behaviour for the scale-free embedded networks. This
is shown in Fig. 6, where for high enough intensity of stimuli $\delta$ and scale-free
exponent $\gamma$, performance can, as in the modular case, be $\eta \simeq 1$. We should point
out that for good performance on these networks we require more neurons for
each bit of information than on modular networks with the same $\lambda$ (in Fig. 6 we
use n = 100, as opposed to n = 10 in Fig. 2). However, that we should be able
to obtain good results for such diverse network topologies underlines that the
mechanism of Cluster Reverberation is robust and not dependent on some very
specific architecture. In fact, we have recently shown that similar metastable
memory states can also occur on networks which have random modularity and
clustering, but a certain degree of assortativity [42].

5 The authors also consider a cutoff distance, but we shall take this to be infinite here.

Figure 5: Proportion of outgoing edges, $\lambda$, from boxes of linear size l against
exponent $\gamma$ for scale-free networks embedded on 2D lattices. Lines from Eq. (4)
and symbols from simulations with $\langle k \rangle = 4$ and N = 1600.

Figure 6: Performance $\eta$ against exponent $\gamma$ for scale-free networks, embedded
on a 2D lattice, with patterns of M = 16 modules of n = 100 neurons each,
$\langle k \rangle = 4$ and N = 1600; patterns are shown with intensities $\delta = 3.5$, 4, 5 and 10,
and T = 0.01 (lines - splines - are drawn as a guide to the eye). Inset: typical
time series for $\gamma = 2$, 3, and 4, with $\delta = 5$.

7 Yes, but does it happen in the brain? 

As we have shown, Cluster Reverberation is a mechanism available to neural 
systems for robust short-term memory without synaptic learning. To the best 
of our knowledge, this is the first mechanism proposed which has these charac- 
teristics - essential for, say, sensory memory or certain working-memory tasks. 
All that is needed is for the network topology to be highly clustered or modu- 
lar, and for small groups of neurons to store one bit of information, as opposed 
to the conventional view which assumes one bit per neuron. Considering the 
enormous number of neurons in the brain, and the fact that real individual neu- 
rons are probably too noisy to store information reliably, these hypotheses do 
not seem farfetched. The mechanism is furthermore consistent both with what 
is known about the topology of the brain, and with experiments which have 
revealed power-law forgetting. 

Since the purpose of this paper is only to describe the mechanism of Clus- 
ter Reverberation, we have made use of the simplest possible model neurons 
- namely, binary neurons with static, uniform synapses - for the sake of clar- 
ity and generality. However, there is no reason to believe that the mechanism 
would cease to function if more neuronal ingredients were to be incorporated. In 
fact, cellular bistability, for instance, would increase performance, and the two 
mechanisms could actually work in conjunction. Similarly, we have also used 
binary patterns to store information. But it is to be expected that patterns de- 
pending on any form of frequency coding, for instance, could also be maintained 
with more sophisticated neurons - such that different modules could be set to 
different mean frequencies. 

Whether Cluster Reverberation would work for biological neural systems 
could be put to the test by growing such modular networks in vitro, stimulating 
appropriately, and observing the duration of the metastable states. In vivo 
recordings of neural activity during short-term memory tasks, together with 
a mapping of the underlying synaptic connections, could be used to ascertain 
whether the brain does indeed make use of this mechanism - although for this 
it must be borne in mind that the neurons forming a module need not find 
themselves close together in metric space. We hope that experiments such as 
these will be carried out and eventually reveal something more about the basis 
of this puzzling emergent property of the brain's known as thought. 
Appendix 1

The number of nodes within a radius r is $n(r) = A_d r^d$, with $A_d$ a constant.
We shall therefore assume a node with degree k to have edges to all nodes
up to a distance $r(k) = (k/A_d)^{1/d}$, and none beyond (note that this is not
necessarily always feasible in practice). To estimate $\lambda$, we shall first calculate
the probability that a randomly chosen edge have length x. The chance that
the edge belong to a node with degree k is $\pi(k) \sim k p(k)$ (where p(k) is the
degree distribution). The proportion of edges that have length x among those
belonging to a node with degree k is $\nu(x|k) = d A_d x^{d-1}/k$ if $A_d x^d < k$, and
0 otherwise. Considering, for example, scale-free networks (as in Ref. [40]), so
that the degree distribution is $p(k) \sim k^{-\gamma}$ in some interval $k \in [k_0, k_{max}]$ [43],
and integrating over k, we have the distribution of lengths,

$$P(x) = (\mathrm{Const.}) \int_{\max(k_0, A_d x^d)}^{k_{max}} \pi(k) \nu(x|k) \, dk = d(\gamma - 2) \, x^{-[d(\gamma - 2) + 1]},$$

where we have assumed, for simplicity, that the network is sufficiently sparse
that $\max(k_0, A_d x^d) = A_d x^d$, $\forall x > 1$, and where we have normalised for the
interval $1 < x < \infty$; strictly, $x < (k_{max}/A_d)^{1/d}$, but we shall also ignore this
effect. Next we need the probability that an edge of length x fall between two
compartments of linear size l. This depends on the geometry of the situation as
well as dimensionality; however, a first approximation which is independent of
such considerations is

$$P_{out}(x) = \min\left(1, \frac{x}{l}\right).$$

We can now estimate the modularity $\lambda$ as

$$\lambda = \int_1^{\infty} P_{out}(x) P(x) \, dx = \frac{1}{d(\gamma - 2) - 1} \left[ d(\gamma - 2) \, l^{-1} - l^{-d(\gamma - 2)} \right].$$

Fig. 5 shows how $\lambda$ depends on $\gamma$ for d = 2 and various box sizes.
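
The integral above can be checked by Monte Carlo: draw edge lengths from a power-law length distribution and average the crossing probability. The sketch assumes a length density proportional to x to the power -(a + 1) on x >= 1 with a = d(gamma - 2) > 0, sampled by inverse transform, and a crossing probability min(1, x/l); the function names and sample sizes are illustrative, and `closed_form` is not defined when a equals one.

```python
import random

def closed_form(gamma, l, d=2):
    """Closed-form modularity (a / l - l**(-a)) / (a - 1), with a = d * (gamma - 2)."""
    a = d * (gamma - 2.0)
    return (a / l - l ** (-a)) / (a - 1.0)

def sampled_modularity(gamma, l, d=2, samples=200000, seed=0):
    """Average of min(1, x / l) over edge lengths x drawn from the power-law density."""
    rng = random.Random(seed)
    a = d * (gamma - 2.0)
    total = 0.0
    for _ in range(samples):
        u = 1.0 - rng.random()          # uniform on (0, 1], avoids division by zero
        x = u ** (-1.0 / a)             # inverse transform: P(X > x) = x**(-a) for x >= 1
        total += min(1.0, x / l)
    return total / samples
```

Agreement between the two functions, for example at an exponent of 3 with l = 10 and d = 2, confirms only that the integration was carried out correctly; it says nothing about how good the underlying approximations are for real embedded networks.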